spherical coordinate
Gaussian Embeddings: How JEPAs Secretly Learn Your Data Density
Balestriero, Randall, Ballas, Nicolas, Rabbat, Mike, LeCun, Yann
Joint Embedding Predictive Architectures (JEPAs) learn representations able to solve numerous downstream tasks out-of-the-box. JEPAs combine two objectives: (i) a latent-space prediction term, i.e., the representation of a slightly perturbed sample must be predictable from the original sample's representation, and (ii) an anti-collapse term, i.e., not all samples should have the same representation. While (ii) is often considered an obvious remedy to representation collapse, we uncover that JEPAs' anti-collapse term does much more: it provably estimates the data density. In short, any successfully trained JEPA can be used to get sample probabilities, e.g., for data curation, outlier detection, or simply for density estimation. Our theoretical finding is agnostic to the dataset and architecture used: in any case, one can compute the learned probability of a sample $x$ efficiently and in closed form using the model's Jacobian matrix at $x$. Our findings are empirically validated across datasets (synthetic, controlled, and ImageNet), across different self-supervised learning methods falling under the JEPA family (I-JEPA and DINOv2), and on multimodal models such as MetaCLIP. We denote the method extracting the JEPA learned density as JEPA-SCORE.
Supplementary 1 Outline. The following is the outline of this supplementary material: In appendix A, we present our upper-bound experiments, which complement the experiments
In Figure 1, we show a t-SNE plot of the distributions in the four datasets used in this work. In Table 1 and Table 2 of this supplementary material, we present our upper-bound results, which complement the experiments shown in Table 2 and Table 3 of the main manuscript. In addition, our proposed method still outperforms the other baselines in both settings (i.e., Settings 1 and 2 described in Sec. 4.2 of our main manuscript), which demonstrates the efficacy of our [...] For more implementation details, please refer to [4].
A 4D Radar Camera Extrinsic Calibration Tool Based on 3D Uncertainty Perspective N Points
Cao, Chuan, Wang, Xiaoning, Xi, Wenqian, Zhang, Han, Chen, Weidong, Wang, Jingchuan
4D imaging radar is a type of low-cost millimeter-wave radar (costing merely 10-20% of lidar systems) capable of providing range, azimuth, elevation, and Doppler velocity information. Accurate extrinsic calibration between millimeter-wave radar and camera systems is critical for robust multimodal perception in robotics, yet remains challenging due to inherent sensor noise characteristics and complex error propagation. This paper presents a systematic calibration framework that addresses these challenges through a spatial 3D uncertainty-aware PnP algorithm (3DUPnP), which explicitly models spherical coordinate noise propagation in radar measurements and then compensates for non-zero error expectations during coordinate transformations. Finally, experimental validation demonstrates significant performance improvements over the state-of-the-art CPnP baseline, including improved consistency in simulations and enhanced precision in physical experiments.
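The "non-zero error expectations" mentioned above can be illustrated with a standard fact: for zero-mean Gaussian angle noise with standard deviation sigma, the expectation of cos(theta + noise) is cos(theta) * exp(-sigma^2 / 2), and likewise for the sine. Converting a noisy spherical measurement to Cartesian coordinates therefore gives a point that is biased toward the sensor. The following is a minimal sketch of that effect only, not the 3DUPnP algorithm; the axis convention and the function names are assumptions made for illustration.

```python
import math

def spherical_to_cartesian(r, az, el):
    """Convert range, azimuth, elevation (radians) to (x, y, z).

    Assumed convention: x forward, y left, z up, with elevation
    measured from the x-y plane.
    """
    x = r * math.cos(el) * math.cos(az)
    y = r * math.cos(el) * math.sin(az)
    z = r * math.sin(el)
    return (x, y, z)

def expected_cartesian(r, az, el, sigma_az, sigma_el):
    """Mean of the converted point under independent, zero-mean Gaussian
    noise on azimuth and elevation (zero-mean range noise that is
    independent of the angles does not change the mean).

    Uses E[cos(a + e)] = cos(a) * exp(-sigma^2 / 2) and the same
    identity for sin.
    """
    k_az = math.exp(-sigma_az ** 2 / 2)
    k_el = math.exp(-sigma_el ** 2 / 2)
    x = r * k_el * math.cos(el) * k_az * math.cos(az)
    y = r * k_el * math.cos(el) * k_az * math.sin(az)
    z = r * k_el * math.sin(el)
    return (x, y, z)

if __name__ == "__main__":
    naive = spherical_to_cartesian(10.0, 0.3, 0.2)
    mean = expected_cartesian(10.0, 0.3, 0.2, 0.05, 0.05)
    print("noise-free point:", naive)
    print("mean of noisy conversions:", mean)
```

The gap between the two printed points is the bias that an uncertainty-aware method would need to compensate for; it grows with the range and with the angular noise level.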
Search for Z/2 eigenfunctions on the sphere using machine learning
Haydys, Andriy, Salm, Willem Adriaan
We use machine learning to search for examples of Z/2 eigenfunctions on the 2-sphere. For this we created a multivalued version of a feedforward deep neural network, and we implemented it using the JAX library. We found Z/2 eigenfunctions for three cases: In the first two cases we fixed the branch points at the vertices of a tetrahedron and of a cube, respectively. In the third case, we allowed the AI to move the branch points around and, in the end, it positioned the branch points at the vertices of a squashed tetrahedron.
Space filling positionality and the Spiroformer
Maurin, M., Evangelista-Alvarado, M. Á., Suárez-Serrato, P.
Transformers excel when dealing with sequential data. Generalizing transformer models to geometric domains, such as manifolds, we encounter the problem of not having a well-defined global order. We propose a solution with attention heads following a space-filling curve. As a first experimental example, we present the Spiroformer, a transformer that follows a polar spiral on the $2$-sphere.
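One way to picture ordering along a spiral on the 2-sphere is sketched below. It assumes a simple pole-to-pole spiral whose polar angle is pi * t and whose azimuth is 2 * pi * turns * t for t in [0, 1]; the parameterization, the `turns` value, and all function names are illustrative assumptions, not the Spiroformer's actual construction.

```python
import math

def spiral_point(t, turns=8):
    """Point on the unit sphere at spiral parameter t in [0, 1]."""
    theta = math.pi * t                # polar angle, north to south pole
    phi = 2 * math.pi * turns * t      # azimuth winds `turns` times
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def spiral_parameter(p, turns=8):
    """Approximate spiral parameter of a unit vector p.

    Keeps the point's azimuth and picks the winding whose polar angle
    is closest to the point's polar angle.
    """
    x, y, z = p
    theta = math.acos(max(-1.0, min(1.0, z)))
    frac = (math.atan2(y, x) % (2 * math.pi)) / (2 * math.pi)
    k = round(turns * theta / math.pi - frac)   # winding index
    return min(1.0, max(0.0, (frac + k) / turns))

def order_along_spiral(points, turns=8):
    """Sort points on the sphere by their position along the spiral."""
    return sorted(points, key=lambda p: spiral_parameter(p, turns))

if __name__ == "__main__":
    pts = [spiral_point(t) for t in (0.9, 0.1, 0.37)]
    for p in order_along_spiral(pts):
        print(round(spiral_parameter(p), 3), p)
```

The resulting order gives tokens on the sphere a one-dimensional sequence index, which is the kind of global order a sequence model needs.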
Feature Geometry for Stereo Sidescan and Forward-looking Sonar
Norman, Kalin, Mangelson, Joshua G.
In this paper, we address stereo acoustic data fusion for marine robotics and propose a geometry-based method for projecting observed features from one sonar to another for a cross-modal stereo sonar setup that consists of both a forward-looking and a sidescan sonar. Our acoustic geometry for sidescan and forward-looking sonar is inspired by the epipolar geometry for stereo cameras, and we leverage relative pose information to project where an observed feature in one sonar image will be found in the image of another sonar. Additionally, we analyze how both the feature location relative to the sonar and the relative pose between the two sonars impact the projection. From simulated results, we identify desirable stereo configurations for applications in field robotics like feature correspondence and recovery of the 3D information of the feature. Field robotic applications, such as localization and mapping, in underwater environments face significant challenges due to the complex and dynamic nature of the marine domain.
Riemannian Manifold Learning for Stackelberg Games with Neural Flow Representations
Liu, Larkin, Rasul, Kashif, Chao, Yutong, Etesami, Jalal
We present a novel framework for online learning in Stackelberg general-sum games, where two agents, the leader and follower, engage in sequential turn-based interactions. At the core of this approach is a learned diffeomorphism that maps the joint action space to a smooth Riemannian manifold, referred to as the Stackelberg manifold. This mapping, facilitated by neural normalizing flows, ensures the formation of tractable isoplanar subspaces, enabling efficient techniques for online learning. By assuming linearity between the agents' reward functions on the Stackelberg manifold, our construct allows the application of standard bandit algorithms. We then provide a rigorous theoretical basis for regret minimization on convex manifolds and establish finite-time bounds on simple regret for learning Stackelberg equilibria. This integration of manifold learning into game theory uncovers a previously unrecognized potential for neural normalizing flows as an effective tool for multi-agent learning. We present empirical results demonstrating the effectiveness of our approach compared to standard baselines, with applications spanning domains such as cybersecurity and economic supply chain optimization.
Shrinking: Reconstruction of Parameterized Surfaces from Signed Distance Fields
Yin, Haotian, Musialski, Przemyslaw
We propose a novel method for reconstructing explicit parameterized surfaces from Signed Distance Fields (SDFs), a widely used implicit neural representation (INR) for 3D surfaces. While traditional reconstruction methods like Marching Cubes extract discrete meshes that lose the continuous and differentiable properties of INRs, our approach iteratively contracts a parameterized initial sphere to conform to the target SDF shape, preserving differentiability and surface parameterization throughout. Each step involves remeshing to maintain uniform distribution, ensuring surface continuity and smooth parameterization. Our experiments demonstrate that this approach not only generates differentiable parameterizations but also achieves competitive reconstruction quality compared to mainstream methods. Our contributions include: (1) Introducing a shrinking-based method for extracting high-quality meshes
Implicit Neural Representations (INRs) [1] have become popular 3D models in computer graphics, with applications in scientific simulation, photogrammetry, generative modeling, and inverse physics [2]. INRs encode continuous signals via neural networks that map spatial coordinates to signal values [3], offering advantages like efficient storage, smooth interpolations, and differentiable features, surpassing traditional grid-based methods [4].
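The idea of contracting an initial sphere onto an SDF can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the authors' method: it uses an analytic box SDF in place of a neural one, a finite-difference gradient, and the update p <- p - sdf(p) * grad sdf(p), and it omits the remeshing step entirely. All names (`box_sdf`, `gradient`, `shrink`) are hypothetical.

```python
import math

def box_sdf(p, half=(0.5, 0.3, 0.2)):
    """Exact signed distance to an axis-aligned box centred at the origin."""
    q = [abs(c) - h for c, h in zip(p, half)]
    outside = math.sqrt(sum(max(c, 0.0) ** 2 for c in q))
    inside = min(max(q), 0.0)
    return outside + inside

def gradient(sdf, p, eps=1e-5):
    """Central finite-difference gradient of sdf at p."""
    g = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        g.append((sdf(hi) - sdf(lo)) / (2 * eps))
    return g

def shrink(sdf, n_theta=8, n_phi=16, radius=2.0, steps=20):
    """Contract a (theta, phi)-parameterized sphere onto the zero level set.

    Each grid point keeps its index in the list, so the spherical
    parameterization survives the contraction.
    """
    pts = []
    for i in range(1, n_theta):          # skip the two poles
        theta = math.pi * i / n_theta
        for j in range(n_phi):
            phi = 2 * math.pi * j / n_phi
            pts.append([radius * math.sin(theta) * math.cos(phi),
                        radius * math.sin(theta) * math.sin(phi),
                        radius * math.cos(theta)])
    for _ in range(steps):
        for p in pts:
            d = sdf(p)
            g = gradient(sdf, p)
            for i in range(3):
                p[i] -= d * g[i]         # move along the gradient by the distance
    return pts

if __name__ == "__main__":
    surface = shrink(box_sdf)
    print(len(surface), "points, max |sdf| =",
          max(abs(box_sdf(p)) for p in surface))
```

Without remeshing, points pile up near the box's edges and corners, which is the kind of non-uniform distribution the remeshing step in the abstract is meant to correct.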
Geotokens and Geotransformers
In transformer architectures, position encoding primarily provides a sense of sequence for input tokens. While the original transformer paper's method has shown satisfactory results in general language processing tasks, there have been new proposals, such as Rotary Position Embedding (RoPE), for further improvement. This paper presents geotokens, input components for transformers, each linked to a specific geographical location. Unlike typical language sequences, for these tokens, the order is not as vital as the geographical coordinates themselves. To represent the relative position in this context and to keep a balance between the real-world distance and the distance in the embedding space, we design a position encoding approach drawing from the RoPE structure but tailored for spherical coordinates.
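A rotary-style encoding for spherical coordinates can be sketched as follows. This is a minimal illustration under assumptions, not necessarily the paper's construction: each consecutive 3-dimensional block of a query or key vector is rotated by the matrix Rz(lon) Ry(lat), so the attention score between two tokens depends on the relative rotation between their locations rather than on absolute longitude. The function names and the choice of rotation order are hypothetical.

```python
import math

def rotation(lat, lon):
    """3x3 rotation matrix Rz(lon) @ Ry(lat); angles in radians."""
    cl, sl = math.cos(lat), math.sin(lat)
    co, so = math.cos(lon), math.sin(lon)
    ry = [[cl, 0.0, sl], [0.0, 1.0, 0.0], [-sl, 0.0, cl]]
    rz = [[co, -so, 0.0], [so, co, 0.0], [0.0, 0.0, 1.0]]
    return [[sum(rz[i][k] * ry[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def encode(vec, lat, lon):
    """Rotate every consecutive 3-dim block of vec by rotation(lat, lon)."""
    if len(vec) % 3 != 0:
        raise ValueError("embedding size must be a multiple of 3")
    r = rotation(lat, lon)
    out = []
    for start in range(0, len(vec), 3):
        block = vec[start:start + 3]
        out.extend(sum(r[i][j] * block[j] for j in range(3))
                   for i in range(3))
    return out

def score(q, k):
    """Plain dot-product attention score."""
    return sum(a * b for a, b in zip(q, k))

if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    k = [0.5, -1.0, 2.0, 1.5, 0.0, -2.0]
    s1 = score(encode(q, 0.3, 1.0), encode(k, 0.5, 1.4))
    # Shifting both longitudes by the same offset leaves the score unchanged.
    s2 = score(encode(q, 0.3, 1.7), encode(k, 0.5, 2.1))
    print(s1, s2)
```

As with RoPE, the rotation preserves vector norms, so the encoding changes only the relative geometry between queries and keys.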